Skip to content

Conversation

@graphemecluster
Copy link
Contributor

  • Removes the incorrect errors on individual &s (not &&s) at certain positions
  • Fixes a related issue that sometimes - (class set range operator) at the end of class sets are not flagged as errors
  • Improves error recovery and makes diagnostic messages more intuitive and user-friendly when there are consecutive &s or -s, when operators are mixed within the same class, or when there are consecutive operands in class intersections and subtractions

Fixes #62707

… in Unicode Sets Mode

- Removes the incorrect errors on individual `&`s (not `&&`s) at certain positions
- Flags `-`s at the beginning and the end of class sets
- Improves error recovery and makes diagnostic messages more intuitive and user-friendly
  when there are consecutive `&`s or `-`s or when operators are mixed within the same class
Copilot AI review requested due to automatic review settings November 4, 2025 06:53
@github-project-automation github-project-automation bot moved this to Not started in PR Backlog Nov 4, 2025
@typescript-bot typescript-bot added the For Backlog Bug PRs that fix a backlog bug label Nov 4, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request improves the error recovery and diagnostic messages for regular expression Unicode sets mode (/v flag) in TypeScript. The changes focus on handling complex operator combinations (&& for intersection and -- for subtraction) in character classes, providing more user-friendly error messages and better error recovery for malformed regex patterns.

Key changes:

  • Enhanced scanner logic to better detect and report errors in Unicode sets mode regular expressions
  • Added two new diagnostic messages (error codes 1547 and 1548) for clearer error reporting
  • Comprehensive test coverage for various edge cases involving &, -, &&, and -- operators

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tests/cases/compiler/regularExpressionUnicodeSets.ts New comprehensive test file covering 188 regex patterns with various operator combinations
tests/baselines/reference/regularExpressionUnicodeSets.types Baseline file capturing type information for the test cases
tests/baselines/reference/regularExpressionUnicodeSets.symbols Baseline file capturing symbol information for the test cases
tests/baselines/reference/regularExpressionUnicodeSets.js Baseline file showing compiled JavaScript output
tests/baselines/reference/regularExpressionUnicodeSets.errors.txt Baseline file documenting the 226 expected error messages
tests/baselines/reference/regularExpressionScanning(target=esnext).errors.txt Updated baseline to reflect new error messages
tests/baselines/reference/regularExpressionScanning(target=es5).errors.txt Updated baseline to reflect new error messages
tests/baselines/reference/regularExpressionScanning(target=es2015).errors.txt Updated baseline to reflect new error messages
src/compiler/scanner.ts Enhanced regex scanner with improved error recovery for Unicode sets mode
src/compiler/diagnosticMessages.json Added two new error messages (codes 1547 and 1548)

Comment on lines +3101 to +3111
// If we see '-', parse the expression as ClassUnion.
// An exception is made for '-&&', in which case it is more intuitive to parse as ClassIntersection and scan '-' as an invalid operand
else if (
ch === CharacterCodes.minus && (
charCodeChecked(pos + 1) !== CharacterCodes.ampersand || charCodeChecked(pos + 2) !== CharacterCodes.ampersand
)
) {
// '-' can't be an operand in a ClassUnion (a class in Unicode sets mode can't start with '-')
error(Diagnostics.Incomplete_character_class_range_Expected_a_class_set_character);
mayContainStrings = false;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was struggling whether to treat - as an unexpected character ("Unexpected '-'. Did you mean to escape it with backslash?") or an incomplete character class with a zero-width message in cases like /[-a-b-]/v

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And temporarily went with the former.

Comment on lines +1852 to +1859
"Incomplete character class range. Expected a class set character.": {
"category": "Error",
"code": 1547
},
"Missing '{0}' in between class set operands. To form a union of these characters, wrap them in a nested class instead.": {
"category": "Error",
"code": 1548
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would appreciate any better suggestions regarding the wording of the messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if old diagnostic messages are amendable. Precisely, shall I alter the message of TS1519 from "Wrap it" to "Wrap them"?

Comment on lines +3205 to +3209
if (ch === charCodeChecked(pos + 1) && (ch === CharacterCodes.minus || ch === CharacterCodes.ampersand)) {
error(Diagnostics.Operators_must_not_be_mixed_within_a_character_class_Wrap_it_in_a_nested_class_instead, pos, 2);
pos += 2;
operand = text.slice(start, pos);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be even more intuitive if we were to rescan the class set as intersection/subtraction when we hit any -- or &&, but it is infeasible without a two-pass scan, the first of which must not emit errors. Unfortunately, I need to give up as it would overcomplicate the implementation.

operand = scanClassSetOperand();
}
// Use the first operator to determine the expression type
switch (charCodeChecked(pos)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might not have functional difference right now but to future-proof this I'd still include this:

Suggested change
switch (charCodeChecked(pos)) {
switch (ch = charCodeChecked(pos)) {

The reason why this should be reassigned here is that the above call to scanClassSetOperand could move the pos cursor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

For Backlog Bug PRs that fix a backlog bug

Projects

Status: Not started

Development

Successfully merging this pull request may close these issues.

TS1508: Unexpected '?'. Did you mean to escape it with backslash? shouldn't report even with v flag

3 participants